Abstract



We compute the probability of satisfiability of a class of random Horn-SAT formulae, motivated by a connection
with the nonemptiness problem of finite tree automata. In particular, when the maximum clause length is 3, this
model displays a curve in its parameter space along which the probability of satisfiability is discontinuous, ending
in a second-order phase transition where it becomes continuous. This is the first case in which a phase transition
of this type has been rigorously established for a random constraint satisfaction problem.


1 Introduction 

In the past decade, phase transitions, or sharp thresholds, have been studied intensively in combinatorial
problems. Although the idea of thresholds in a combinatorial context was introduced as early as 1960 [15], in recent
years it has been a major subject of research in the communities of theoretical computer science, artificial
intelligence and statistical physics. Phase transitions have been observed in numerous combinatorial problems in which
the probability of satisfiability jumps from 1 to 0 when the density of constraints exceeds a critical threshold.

The problem at the center of this research is, of course, 3-SAT. An instance of 3-SAT is a Boolean formula
consisting of a conjunction of clauses, where each clause is a disjunction of three literals. The goal is to find
a truth assignment that satisfies all the clauses and thus the entire formula. The density of a 3-SAT instance is
the ratio of the number of clauses to the number of variables. We call the number of variables the size of the
instance. Experimental studies [19, 28, 29] show a dramatic shift in the probability of satisfiability of random
3-SAT instances, from 1 to 0, located at a critical density $r_c \approx 4.26$. However, in spite of compelling
arguments from statistical physics [25, 26], and rigorous upper and lower bounds on the threshold if it exists
[11, 18, 23], there is still no mathematical proof that a phase transition takes place at that density. For a few
variants of SAT the existence and location of phase transitions have been established rigorously, in particular
for 2-SAT, 3-XORSAT,
and 1-in-k SAT (3 El El GO [12.



In this paper we prove the existence of a more elaborate type of phase transition, where a curve of discontinuities 
in a two-dimensional parameter space ends in a second-order transition where the probability of satisfiability 
becomes continuous. We focus on a particular variant of 3-SAT, namely Horn-SAT. A Horn clause is a disjunction 
of literals of which at most one is positive, and a Horn-SAT formula is a conjunction of Horn clauses. Unlike 
3-SAT, Horn-SAT is a tractable problem; the complexity of Horn-SAT is linear in the size of the formula [10].
This tractability allows one to study random Horn-SAT formulae for much larger input sizes than we can achieve
using complete algorithms for 3-SAT.

An additional motivation for studying random Horn-SAT comes from the fact that Horn formulae are connected
to several other areas of computer science and mathematics [24]. In particular, Horn formulae are connected to
automata theory, as the transition relation, the starting state, and the set of final states of an automaton can be
described using Horn clauses. For example, if we consider automata on binary trees, then Horn clauses of length
three can be used to describe its transition relation, while Horn clauses of length one can describe the starting state
and the set of the final states of the automaton (we elaborate on this below). Then the question of the emptiness
of the language of the automaton can be translated to a question about the satisfiability of the formula. Since
automata-theoretic techniques have recently been applied in automated reasoning [30, 31], the behavior of random
Horn formulae might shed light on these applications.

Threshold properties of random Horn-SAT problems have recently been actively studied. The probability of
satisfiability of random Horn formulae in a variable-clause-length model was fully characterized in [20, 21],
where it was shown that random Horn formulae have a coarse rather than a sharp satisfiability threshold, meaning
that the problem does not have a phase transition in this model. The variable-clause-length model used there is
ideally suited to studying Horn formulae in connection with knowledge-based systems [24]. Bench-Capon and
Dunne [4] studied a fixed-clause-length model, in which each Horn clause has precisely $k$ literals, and proved a
sharp threshold with respect to assignments that have at least $k - 1$ variables assigned to be true.

Motivated by the connection between the automata emptiness problem and Horn satisfiability, Demopoulos
and Vardi [14] studied the satisfiability of two types of fixed-clause-length random Horn-SAT formulae. They
considered 1-2-Horn-SAT, where formulae consist of clauses of length one and two only, and 1-3-Horn-SAT,
where formulae consist of clauses of length one and three only. These two classes can be viewed as the Horn
analogue of 2-SAT and 3-SAT. For 1-2-Horn-SAT, they showed experimentally that there is a coarse transition
(see Figure 4), which can be explained and analyzed in terms of random digraph connectivity [22]. The situation
for 1-3-Horn-SAT is less clear cut. On one hand, recent results on random undirected hypergraphs [13] fit the
experimental data of [14] quite well. On the other, a scaling analysis of the data suggested that the transition between
the mostly-satisfiable and mostly-unsatisfiable regions (the "waterfall" in Figure 1) is steep but continuous, rather
than a step function. It was therefore not clear if the model exhibits a phase transition, in spite of experimental
data for instances with tens of thousands of variables.

In this paper we generalize the fixed-clause-length model of [14] and offer a complete analysis of the probability
of satisfiability in this model. For a finite $k > 0$ and a vector $\mathbf{d}$ of $k$ nonnegative real numbers
$d_1, d_2, \ldots, d_k$ with $d_1 < 1$, let the random Horn-SAT formula $H^k_{n,\mathbf{d}}$ be the conjunction of

• a single negative literal $\bar{x}_1$,

• $d_1 n$ positive literals chosen uniformly without replacement from $x_2, \ldots, x_n$, and

• for each $2 \le j \le k$, $d_j n$ clauses chosen uniformly from the possible Horn clauses with $j$ variables
where one literal is positive.

Thus, the classes studied in [14] are $H^2_{n,d_1,d_2}$ and $H^3_{n,d_1,0,d_3}$ respectively.

With this model in hand, we settle the question of sharp thresholds for 1-3-Horn-SAT. In particular, we show
that there are sharp thresholds in some regions of the $(d_1, d_3)$ plane in the probability of satisfiability, although not
from 1 to 0. We start with the following general result for the $H^k_{n,\mathbf{d}}$ model.

Theorem 1.1 Let $t_0$ be the smallest positive root of the equation

$$ \ln \frac{1-t}{1-d_1} + \sum_{j=2}^{k} d_j t^{j-1} = 0 . \tag{1} $$

Call $t_0$ simple if it is not a double root of (1), i.e., if the derivative of the left-hand side of (1) with respect to $t$ is
nonzero at $t_0$. If $t_0$ is simple, the probability that a random formula from $H^k_{n,\mathbf{d}}$ is satisfiable in the limit $n \to \infty$ is

$$ \Phi(\mathbf{d}) := \lim_{n \to \infty} \Pr[H^k_{n,\mathbf{d}} \text{ is satisfiable}] = \frac{1 - t_0}{1 - d_1} . \tag{2} $$
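As a numerical companion to Theorem 1.1, the following sketch locates $t_0$ by scanning for the first sign change of $s_1(t) = 1 - t - (1 - d_1) \exp(-\sum_{j \ge 2} d_j t^{j-1})$, whose zeros in $(0,1)$ are the roots of (1), and then evaluates (2). The function names, the grid resolution and the bisection depth are illustrative choices rather than part of the paper, and the scan assumes $0 < d_1 < 1$ and that $t_0$ is a simple root.

```python
import math

def s1(t, d):
    """1 - t - (1 - d_1) * exp(-sum_{j>=2} d_j t^(j-1)), with d = (d_1, ..., d_k)."""
    g = sum(dj * t ** j for j, dj in enumerate(d[1:], start=1))
    return 1.0 - t - (1.0 - d[0]) * math.exp(-g)

def smallest_root(d, grid=20000, iters=60):
    """Smallest positive zero of s1 (assumes 0 < d_1 < 1 and a simple root)."""
    lo, hi = 0.0, 1.0
    for i in range(1, grid + 1):      # s1(0) = d_1 > 0 and s1(1) < 0
        hi = i / grid
        if s1(hi, d) <= 0.0:
            break                     # first grid point at or past the root
        lo = hi
    for _ in range(iters):            # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if s1(mid, d) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi(d):
    """Limiting probability of satisfiability (1 - t_0) / (1 - d_1)."""
    return (1.0 - smallest_root(d)) / (1.0 - d[0])

if __name__ == "__main__":
    for d3 in (2.9, 3.1):
        print(d3, phi((0.1, 0.0, d3)))
```

For $\mathbf{d} = (0.1, 0, 2.9)$ and $(0.1, 0, 3.1)$ the printed values should come out close to the probabilities 0.905 and 0.064 quoted in Section 5. A grid scan is used instead of a generic root finder because (1) may have several roots and only the smallest one matters.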

Specializing this result to the case $k = 2$ yields an exact formula that matches the experimental results in [14]:

Proposition 1.2 The probability that a random formula from $H^2_{n,d_1,d_2}$ is satisfiable in the limit $n \to \infty$ is

$$ \Phi(d_1, d_2) := \lim_{n \to \infty} \Pr[H^2_{n,d_1,d_2} \text{ is satisfiable}] = - \frac{W\left(-(1 - d_1) d_2 e^{-d_2}\right)}{(1 - d_1) d_2} . \tag{3} $$

Here $W(\cdot)$ is Lambert's function, defined as the principal root of the equation $W(x) e^{W(x)} = x$.

For the case $k = 3$ and $d_2 = 0$, we do not have a closed-form expression for the probability of satisfiability,
though numerically Figure 1 shows a very good fit to the experimental results of [14]. In addition, we find an
interesting phase transition behavior in the $(d_1, d_3)$ plane, described by the following proposition.

Proposition 1.3 The probability $\Phi(d_1, d_3)$ that a random formula from $H^3_{n,d_1,0,d_3}$ is satisfiable is
continuous for $d_3 < 2$ and discontinuous for $d_3 > 2$. Its discontinuities are given by a curve $\Gamma$ in the $(d_1, d_3)$
plane described by the equation

$$ d_1 = 1 - \frac{\exp\left(\frac{1}{4}\left(\sqrt{d_3} - \sqrt{d_3 - 2}\right)^2\right)}{d_3 - \sqrt{d_3 (d_3 - 2)}} . \tag{4} $$

This curve consists of the points $(d_1, d_3)$ at which $t_0$ is a double root of (1), and ends at the critical point

$$ \left(1 - \sqrt{e}/2,\ 2\right) = (0.1756\ldots,\ 2) . $$



The curve $\Gamma$ of discontinuities described in Proposition 1.3 can be seen in the right part of Figure 1. The drop
at the "waterfall" decreases as we approach the critical point $(0.1756\ldots, 2)$, where the probability of satisfiability
becomes continuous (although its derivatives at that point are infinite). We can also see this in Figure 2, which
shows a contour plot; the contours are bunched at the discontinuity, and "fan out" at the critical point. In both
cases our calculations closely match the experimental results of [14].




Figure 2. Contour plots. Left, the experimental results of [14]. Right, our analytical results.

In statistical physics, we would say that $\Gamma$ is a curve of first-order transitions, in which the order parameter
is discontinuous, ending in a second-order transition at the tip of the curve, at which the order parameter is 
continuous, but has a discontinuous derivative (see e.g. O). A similar transition takes place in the Ising model, 
where the two parameters are the temperature $T$ and the external field $H$; the magnetization is discontinuous at
the line $H = 0$ for $T < T_c$ where $T_c$ is the transition temperature, but there is a second-order transition at $(T_c, 0)$
and the magnetization is continuous for $T > T_c$.

To our knowledge, this is the first time that a continuous-discontinuous transition of this type has been
established rigorously in a model of random constraint satisfaction problems. We note that a similar phenomenon is
believed to take place for the $(2 + p)$-SAT model at the triple point $p = 2/5$; here the order parameter is the size of
the backbone, i.e., the number of variables that take fixed values in every truth assignment |3]|Sl- Indeed, in our 
model the probability of satisfiability is closely related to the size of the backbone, as we see below. 

2 Horn-SAT and Automata 

Our main motivation for studying the satisfiability of Horn formulae is the unusually rich type of phase transition
described above, and the fact that its tractability allows us to perform experiments on formulae of very large size.
However, the original motivation of [14] that led to the present work is the fact that Horn formulae can be used to
describe finite automata on words and trees.

A finite automaton $A$ is a 5-tuple $A = (S, \Sigma, \delta, s, F)$, where $S$ is a finite set of states, $\Sigma$ is an alphabet, $s$ is a
starting state, $F \subseteq S$ is the set of final (accepting) states and $\delta$ is a transition relation. In a word automaton, $\delta$ is a
function from $S \times \Sigma$ to $2^S$, while in a binary-tree automaton $\delta$ is a function from $S \times \Sigma$ to $2^{S \times S}$. Intuitively, for
word automata $\delta$ provides a set of successor states, while for binary-tree automata $\delta$ provides a set of successor
state pairs. A run of an automaton on a word $a = a_1 a_2 \cdots a_n$ is a sequence of states $s_0 s_1 \cdots s_n$ such that $s_0 = s$
and $(s_{i-1}, a_i, s_i) \in \delta$. A run is successful if $s_n \in F$; in this case we say that $A$ accepts the word $a$. A run of
an automaton on a binary tree $t$ labeled with letters from $\Sigma$ is a binary tree $r$ labeled with states from $S$ such
that $\mathrm{root}(r) = s$ and for a node $i$ of $t$, $(r(i), t(i), r(\text{left-child-of-}i), r(\text{right-child-of-}i)) \in \delta$. Thus, each pair in
$\delta(r(i), t(i))$ is a possible labeling of the children of $i$. A run is successful if for all leaves $l$ of $r$, $r(l) \in F$; in this
case we say that $A$ accepts the tree $t$. The language $L(A)$ of a word automaton $A$ is the set of all words $a$ for
which there is a successful run of $A$ on $a$. Likewise, the language $L(A)$ of a tree automaton $A$ is the set of all trees
$t$ for which there is a successful run of $A$ on $t$. An important question in automata theory, which is also of great
practical importance in the field of formal verification [30, 31], is: given an automaton $A$, is $L(A)$ non-empty? We
now show how the problem of non-emptiness of automata languages translates to Horn satisfiability. Thus, getting
a better understanding of the satisfiability of Horn formulae would tell us about the expected answer to automata
nonemptiness problems.

Algorithm PUR:
    while ($\phi$ contains positive unit clauses)
        choose a random positive unit clause $x$
        remove all other clauses in which $x$ occurs positively in $\phi$
        shorten all clauses in which $x$ appears negatively
        label $x$ as "implied" and call the algorithm recursively.
    if no contradictory clause was created
        accept $\phi$
    else
        reject $\phi$.

Figure 3. Positive Unit Resolution.

Consider first a word automaton $A = (S, \Sigma, \delta, s_0, F)$. Construct a Horn formula $\phi_A$ over the set $S$ of variables
as follows: create a clause $(\bar{s}_0)$, for each $s_i \in F$ create a clause $(s_i)$, and for each element $(s_i, a, s_j)$ of $\delta$ create
a clause $(\bar{s}_j, s_i)$, where $(s_i, \ldots, s_k)$ represents the clause $s_i \vee \cdots \vee s_k$ and $\bar{s}_j$ is the negation of $s_j$. Similarly
to the word automata case, we can show how to construct a Horn formula from a binary-tree automaton. Let
$A = (S, \Sigma, \delta, s_0, F)$ be a binary-tree automaton. Then we can construct a Horn formula $\phi_A$ using the construction
above with the only difference that, since $\delta$ in this case is a function from $S \times \Sigma$ to $2^{S \times S}$, for each element
$(s_i, a, s_j, s_k)$ of $\delta$ we create a clause $(\bar{s}_j, \bar{s}_k, s_i)$.

Proposition 2.1 [14] Let $A$ be a word or binary tree automaton and $\phi_A$ the Horn formula constructed as
described above. Then $L(A)$ is non-empty if and only if $\phi_A$ is unsatisfiable.
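The translation can be sketched in a few lines for binary-tree automata. The helper names (`tree_automaton_to_horn`, `horn_satisfiable`) and the encoding of a clause as a `(head, body)` pair are assumptions made for this illustration, and satisfiability is decided by plain forward chaining rather than by the linear-time algorithm of [10].

```python
def tree_automaton_to_horn(start, finals, delta):
    """Build the Horn formula of a binary-tree automaton.

    delta is an iterable of transitions (s_i, a, s_j, s_k). A clause is a
    pair (head, body): head is the positive literal (or None), body holds
    the variables that occur negated.
    """
    clauses = [(None, (start,))]                 # negative unit clause on the start state
    clauses += [(s, ()) for s in finals]         # positive unit clauses for final states
    clauses += [(si, (sj, sk)) for (si, _a, sj, sk) in delta]  # (not s_j, not s_k, s_i)
    return clauses

def horn_satisfiable(clauses):
    """Forward chaining: derive all implied variables, then test negative clauses."""
    true_vars, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if (head is not None and head not in true_vars
                    and all(b in true_vars for b in body)):
                true_vars.add(head)
                changed = True
    return not any(head is None and all(b in true_vars for b in body)
                   for head, body in clauses)

if __name__ == "__main__":
    nonempty = tree_automaton_to_horn("q0", {"q1"}, [("q0", "a", "q1", "q1")])
    empty = tree_automaton_to_horn("q0", {"q1"}, [("q0", "a", "q0", "q1")])
    print(horn_satisfiable(nonempty), horn_satisfiable(empty))
```

In the first automaton the start state reaches final states on both children, so the language is non-empty and the formula is unsatisfiable; in the second the start state can only be justified by itself, so the language is empty and the formula is satisfiable.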

3 Main Result 

Consider the positive unit resolution algorithm PUR applied to a random formula $\phi$ (Figure 3). The proof of
Theorem 1.1 follows immediately from the following lemma, which establishes the size of the backbone of the
formula with the single negative literal $\bar{x}_1$ removed: that is, the set of positive literals implied by the positive unit
clauses and the clauses of length greater than 1. Then $\phi$ is satisfiable as long as $x_1$ is not in this backbone.

Lemma 3.1 Let $\phi$ be a random Horn-SAT formula $H^k_{n,\mathbf{d}}$ with $d_1 > 0$. Denote by $t_0$ the smallest positive root of
Equation (1), and suppose that $t_0$ is simple. Then, for any $\epsilon > 0$, the number $N_{\mathbf{d},n}$ of implied positive literals,
including the $d_1 n$ initially positive literals, satisfies w.h.p. the inequality

$$ (t_0 - \epsilon) \cdot n < N_{\mathbf{d},n} < (t_0 + \epsilon) \cdot n . \tag{5} $$

Proof: First, we give a heuristic argument, analogous to the branching process argument for the size of the giant
component in a random graph. The number $m$ of clauses of length $j$ with a given literal $x$ as their implicate (i.e., in
which $x$ appears positively) is Poisson-distributed with mean $d_j$. If any of these clauses have the property that all
their literals whose negations appear are implied, then $x$ is implied as well. In addition, $x$ is implied if it is one of
the $d_1 n$ initially positive literals. Therefore, the probability that $x$ is not implied is the probability that it is not one
of the initially positive literals, and that, for all $j$, for all $m$ clauses $c$ of length $j$ with $x$ as their implicate, at least
one of the $j - 1$ literals whose negations appear in $c$ is not implied. Assuming all these events are independent, if
$t$ is the fraction of literals that are implied, we have

$$
\begin{aligned}
1 - t &= (1 - d_1) \prod_{j=2}^{k} \left( \sum_{m=0}^{\infty} \frac{e^{-d_j} d_j^m}{m!} \left(1 - t^{j-1}\right)^m \right) \\
      &= (1 - d_1) \prod_{j=2}^{k} \exp\left(-d_j t^{j-1}\right) = (1 - d_1) \exp\left(-\sum_{j=2}^{k} d_j t^{j-1}\right) ,
\end{aligned}
$$

yielding Equation (1).

To make this rigorous, we use a standard technique for proving results about threshold properties: analysis
of algorithms via differential equations [32] (see [1] for a review). We analyze the while loop of PUR shown
in Figure 3; specifically, we view PUR as working in stages, indexed by the number of literals that are labeled
"implied." After $T$ steps of this process, $T$ variables are labeled as implied. At each stage the resulting formula
consists of a set of Horn clauses of length $j$ for $1 \le j \le k$ on the $n - T$ unlabeled variables. Let the number
of distinct clauses of length $j$ in this formula be $S_j(T)$; we rely on the fact that, at each stage $T$, conditioned on
the values of $S_j(T)$ the formula is uniformly random. This follows from an easy induction argument which is
standard for problems of this type (see e.g. \Tt\). 

Now, the variables appearing in the clauses present at stage $T$ are chosen uniformly from the $n - T$ remaining
variables, so the probability that the chosen variable $x$ appears in a given clause of length $j$ is $j/(n - T)$, and the
probability that a given clause of length $j + 1$ is shortened to one of length $j$ (as opposed to removed) is $j/(n - T)$.
A newly shortened clause is distinct from all the others with probability $1 - o(1)$ unless $j = 1$, in which case it is
distinct with probability $(n - T - S_1)/(n - T)$. Finally, each stage labels the variable in one of the $S_1(T)$ unit
clauses as implied. Thus the expected effect of each step is

$$
\begin{aligned}
E[S_j(T+1)] &= S_j(T) + \frac{j S_{j+1}(T) - j S_j(T)}{n - T} + o(1) \quad \text{for all } 2 \le j \le k , \\
E[S_1(T+1)] &= S_1(T) + \left(\frac{n - T - S_1(T)}{n - T}\right) \left(\frac{S_2(T)}{n - T}\right) - 1 + o(1) ,
\end{aligned}
$$

where $S_{k+1}(T) = 0$.

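Setting $T = t \cdot n$ and $S_j(T) = s_j(t) \cdot n$, we rescale this to form a system of differential equations:

$$
\begin{aligned}
\frac{ds_j}{dt} &= j \, \frac{s_{j+1}(t) - s_j(t)}{1 - t} \quad \text{for all } 2 \le j \le k , \\
\frac{ds_1}{dt} &= \left(\frac{1 - t - s_1(t)}{1 - t}\right) \left(\frac{s_2(t)}{1 - t}\right) - 1 .
\end{aligned}
\tag{6}
$$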



Then Wormald's Theorem tells us that, for any constant $\delta > 0$, for all $t$ such that $s_1 > \delta$, w.h.p. we have
$S_j(t \cdot n) = s_j(t) \cdot n + o(n)$ where $s_j(t)$ is the solution to the system (6). With the initial conditions $s_j(0) = d_j$
for $1 \le j \le k$, a little work shows that this solution is

$$
\begin{aligned}
s_j(t) &= (1 - t)^j \sum_{\ell=j}^{k} \binom{\ell - 1}{j - 1} d_\ell \, t^{\ell - j} \quad \text{for all } 2 \le j \le k , \\
s_1(t) &= 1 - t - (1 - d_1) \exp\left(-\sum_{j=2}^{k} d_j t^{j-1}\right) .
\end{aligned}
\tag{7}
$$
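As a sanity check on the closed form, one can integrate the rescaled system numerically and compare. The sketch below uses a classical fourth-order Runge-Kutta step and takes the system to be $ds_j/dt = j (s_{j+1} - s_j)/(1 - t)$ for $2 \le j \le k$ (with $s_{k+1} = 0$) and $ds_1/dt = \frac{1 - t - s_1}{1 - t} \cdot \frac{s_2}{1 - t} - 1$, with $s_j(0) = d_j$. The parameter vector, the step count and all function names are illustrative assumptions.

```python
import math

def rhs(t, s):
    """Right-hand side of the system; s[0] is s_1, ..., s[k-1] is s_k."""
    k = len(s)
    out = [0.0] * k
    for j in range(2, k + 1):
        nxt = s[j] if j < k else 0.0            # s_{j+1}, with s_{k+1} = 0
        out[j - 1] = j * (nxt - s[j - 1]) / (1.0 - t)
    s2 = s[1] if k >= 2 else 0.0
    out[0] = (1.0 - t - s[0]) / (1.0 - t) * s2 / (1.0 - t) - 1.0
    return out

def rk4(d, t_end, steps=2000):
    """Integrate from t = 0 (where s_j = d_j) to t_end with fixed RK4 steps."""
    s, t, h = list(d), 0.0, t_end / steps
    for _ in range(steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, [x + h / 2 * y for x, y in zip(s, k1)])
        k3 = rhs(t + h / 2, [x + h / 2 * y for x, y in zip(s, k2)])
        k4 = rhs(t + h, [x + h * y for x, y in zip(s, k3)])
        s = [x + h / 6 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(s, k1, k2, k3, k4)]
        t += h
    return s

def closed_form(d, t):
    """Candidate solution: binomial sums for j >= 2, exponential form for s_1."""
    k = len(d)
    s = [0.0] * k
    for j in range(2, k + 1):
        s[j - 1] = (1 - t) ** j * sum(
            math.comb(l - 1, j - 1) * d[l - 1] * t ** (l - j)
            for l in range(j, k + 1))
    g = sum(d[j - 1] * t ** (j - 1) for j in range(2, k + 1))
    s[0] = 1 - t - (1 - d[0]) * math.exp(-g)
    return s

if __name__ == "__main__":
    d = (0.3, 0.5, 1.0)
    print(rk4(d, 0.15))
    print(closed_form(d, 0.15))
```

The two printed vectors should agree to many digits. The binomial coefficient has a direct reading: a clause of original length $\ell$ has length $j$ at time $t$ when its positive literal is not yet implied and exactly $\ell - j$ of its $\ell - 1$ negated variables are.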



Note that $s_1(t)$ is continuous, $s_1(0) = d_1 > 0$, and $s_1(1) < 0$ since $d_1 < 1$. Thus $s_1(t)$ has at least one root in
the interval $(0, 1)$. Since PUR halts when there are no unit clauses, i.e., when $S_1(T) = 0$, we expect the stage at
which it halts, and thus the number of implied positive literals, to be $T = t_0 n + o(n)$ where $t_0$ is the smallest positive
root of $s_1(t) = 0$, or equivalently, dividing by $1 - d_1$ and taking the logarithm, of Equation (1).

However, the conditions of Wormald's theorem do not allow us to run the differential equations all the way up
to stage $t_0 n$. We therefore choose small constants $\epsilon, \delta > 0$ such that $s_1(t_0 - \epsilon) = \delta$ and run the algorithm until
stage $(t_0 - \epsilon) n$. At this point $(t_0 - \epsilon) n$ literals are already implied, proving the lower bound of (5).

To prove the upper bound of (5), recall that by assumption $t_0$ is a simple root of (1), i.e., the derivative
of the left-hand side of (1) with respect to $t$ is nonzero at $t_0$. It is easy to see that this is equivalent to $ds_1/dt < 0$
at $t_0$. Since $ds_1/dt$ is analytic, there is a constant $c > 0$ such that $ds_1/dt < 0$ for all $t_0 - c < t < t_0 + c$.
Set $\epsilon < c$; the number of literals implied during these stages is bounded above by a subcritical branching process
whose initial population is w.h.p. $\delta n + o(n)$, and by standard arguments we can bound its total progeny to be $\epsilon' n$
for any $\epsilon' > 0$ by taking $\delta$ small enough. □
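The statement of Lemma 3.1 can be probed empirically. The sketch below samples a formula, runs positive unit resolution in the form of forward chaining with per-clause counters, and checks that the implied fraction $N/n$ nearly solves $1 - t = (1 - d_1) \exp(-\sum_{j \ge 2} d_j t^{j-1})$. The sampler draws clauses independently (so rare duplicates are possible), and the function names, seed and parameters are assumptions made for this illustration.

```python
import math
import random

def sample_formula(n, d, rng):
    """Positive units on x_2..x_n plus round(d_j * n) random Horn clauses per length j."""
    units = rng.sample(range(1, n), round(d[0] * n))   # variable 0 plays the role of x_1
    clauses = []
    for j in range(2, len(d) + 1):
        for _ in range(round(d[j - 1] * n)):
            vs = rng.sample(range(n), j)               # j distinct variables
            clauses.append((vs[0], vs[1:]))            # (positive literal, negated variables)
    return units, clauses

def implied_literals(n, units, clauses):
    """Set of variables forced true by the positive unit clauses."""
    missing = [len(body) for _, body in clauses]       # negated variables not yet implied
    watch = [[] for _ in range(n)]
    for c, (_, body) in enumerate(clauses):
        for v in body:
            watch[v].append(c)
    implied, stack = set(units), list(units)
    while stack:
        x = stack.pop()
        for c in watch[x]:                             # shorten clauses containing not-x
            missing[c] -= 1
            head = clauses[c][0]
            if missing[c] == 0 and head not in implied:
                implied.add(head)                      # clause became a positive unit
                stack.append(head)
    return implied

def s1(t, d):
    """Residual of the fixed-point equation for the implied fraction t."""
    g = sum(dj * t ** j for j, dj in enumerate(d[1:], start=1))
    return 1.0 - t - (1.0 - d[0]) * math.exp(-g)

if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 100_000, (0.1, 0.0, 2.0)
    units, clauses = sample_formula(n, d, rng)
    frac = len(implied_literals(n, units, clauses)) / n
    print(frac, s1(frac, d))       # the residual should be close to zero
```

The formula is satisfiable exactly when variable 0 (standing for $x_1$) is absent from the returned set, so repeating the experiment and counting how often that happens gives an empirical estimate of the probability of satisfiability.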

It is easy to see that the backbone of implied positive literals is a uniformly random subset of $\{x_1, \ldots, x_n\}$ of
size $N_{\mathbf{d},n}$. Since $x_1$ is guaranteed to not be among the $d_1 n$ initially positive literals, the probability that $x_1$ is not
in this backbone is

$$ \frac{n - N_{\mathbf{d},n}}{(1 - d_1) \cdot n} . $$

By completeness of positive unit resolution for Horn satisfiability, this is precisely the probability that $\phi$ is
satisfiable. Applying Lemma 3.1 and taking $\epsilon \to 0$ proves equation (2) and completes the proof of Theorem 1.1.

We make several observations. First, if we set $k = 2$ and take the limit $d_1 \to 0$, Lemma 3.1 recovers Karp's
result [22] on the size of the reachable component of a random directed graph with mean out-degree $d = d_2$, the
root of $\ln(1 - t) + dt = 0$.

Secondly, as we will see below, the condition that $t_0$ is simple is essential. Indeed, for the 1-3-Horn-SAT model
studied in [14], the curve $\Gamma$ of discontinuities, where the probability of satisfiability drops in the "waterfall" of
Figure 1, consists exactly of those $(d_1, d_3)$ where $t_0$ is a double root, which implies $ds_1/dt = 0$ at $t_0$.

Finally, we note that Lemma 3.1 is very similar to Darling and Norris's work on identifiability in random
undirected hypergraphs [13], where the number of hyperedges of length $j$ is Poisson-distributed with mean $\beta_j$.
Their result reads

$$ \ln(1 - t) + \sum_{j=1}^{k} j \beta_j t^{j-1} = 0 . $$

We can make this match (1) as follows. First, since each hyperedge of length $j$ corresponds to $j$ Horn clauses, we
set $d_j = j \beta_j$ for all $j \ge 2$. Then, since edges are chosen with replacement in their model, the expected number of
distinct clauses of length 1 (i.e., positive literals) is $d_1 n$ where $d_1 = 1 - e^{-\beta_1}$.
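Spelled out, the two substitutions turn the hypergraph equation into the one used here. The following short check uses only the identifications $d_j = j\beta_j$ for $j \ge 2$ and $1 - d_1 = e^{-\beta_1}$ stated above:

```latex
\begin{align*}
\ln(1-t) + \sum_{j=1}^{k} j\beta_j t^{j-1}
  &= \ln(1-t) + \beta_1 + \sum_{j=2}^{k} d_j t^{j-1}
     && \text{since } d_j = j\beta_j \text{ for } j \ge 2 \\
  &= \ln(1-t) - \ln(1-d_1) + \sum_{j=2}^{k} d_j t^{j-1}
     && \text{since } 1 - d_1 = e^{-\beta_1} \\
  &= \ln\frac{1-t}{1-d_1} + \sum_{j=2}^{k} d_j t^{j-1} .
\end{align*}
```

Setting the last line to zero and exponentiating gives $1 - t = (1 - d_1) \exp(-\sum_{j=2}^{k} d_j t^{j-1})$, i.e., $s_1(t) = 0$, which is the condition that defines $t_0$ in the proof of Lemma 3.1.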

4 Application to $H^2_{n,d_1,d_2}$

For $H^2_{n,d_1,d_2}$, Equation (1) can be rewritten as $1 - t = (1 - d_1) \cdot e^{-d_2 t}$. Denoting $y = d_2 (t - 1)$, this implies

$$ y e^y = d_2 (t - 1) \cdot e^{d_2 (t - 1)} = -d_2 (1 - d_1) \cdot e^{-d_2 t} \cdot e^{d_2 (t - 1)} = -(1 - d_1) d_2 e^{-d_2} . $$

Solving this yields

$$ t = 1 + \frac{1}{d_2} W\left(-(1 - d_1) d_2 e^{-d_2}\right) $$

and substituting this into (2) proves Equation (3) and Proposition 1.2. In Figure 4 we plot the probability of
satisfiability $\Phi(d_1, d_2)$ as a function of $d_2$ for $d_1 = 0.1$ and compare it with the experimental results of [14]: the
agreement is excellent.
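The closed form is easy to evaluate without a special-function library. The sketch below computes the principal branch of $W$ by Newton's method on $w e^w = x$, which is adequate for the arguments $-1/e < x \le 0$ that arise here, and then checks that $t = 1 + W(\cdot)/d_2$ solves $1 - t = (1 - d_1) e^{-d_2 t}$. The function names, the iteration cap and the stopping tolerance are illustrative assumptions.

```python
import math

def lambert_w0(x, iters=100):
    """Principal branch of W for -1/e < x <= 0, by Newton's method from w = 0."""
    w = 0.0
    for _ in range(iters):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < 1e-15:
            break
    return w

def phi_12(d1, d2):
    """-W(-(1 - d1) d2 exp(-d2)) / ((1 - d1) d2), for 0 < d1 < 1 and d2 > 0."""
    x = -(1.0 - d1) * d2 * math.exp(-d2)
    return -lambert_w0(x) / ((1.0 - d1) * d2)

if __name__ == "__main__":
    d1 = 0.1
    for d2 in (0.5, 1.0, 2.0, 3.0):
        p = phi_12(d1, d2)
        t = 1.0 - (1.0 - d1) * p                    # implied fraction t_0
        residual = (1.0 - t) - (1.0 - d1) * math.exp(-d2 * t)
        print(d2, p, residual)                      # residual should be ~ 0
```

Starting Newton's method at $w = 0$ keeps the iterates on the principal branch: for $x \in (-1/e, 0)$ the function $w e^w - x$ is increasing and convex on $w > -1$, so the iterates decrease monotonically to the root in $(-1, 0)$.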



[Figure 4 plots. Right panel title: satisfiability plot for random 1-2-Horn-SAT for several order values between 500 and 32000, where $d_1 = 0.1$; vertical axis: Pr[SAT].]

Figure 4. The probability of satisfiability for 1-2-Horn formulae as a function of $d_2$, where $d_1 = 0.1$.
Left, our analytic results; right, the experimental data of [14].



5 A continuous-discontinuous phase transition for $H^3_{n,d_1,0,d_3}$

For the random model $H^3_{n,d_1,0,d_3}$ studied in [14], an analytic solution analogous to (3) does not seem to exist.
Let us, however, rewrite (1) as

$$ 1 - t = f(t) := (1 - d_1) e^{-d_3 t^2} . \tag{8} $$

We claim that for some values of $d_1$ and $d_3$ there is a phase transition in the roots of (8). For instance, consider
the plot of $f(t)$ shown in Figure 5 for $d_1 = 0.1$ and $d_3 = 3$. Here $f(t)$ is tangent to $1 - t$, so there is a bifurcation
as we vary either parameter; for $d_3 = 2.9$, for instance, $f(t)$ crosses $1 - t$ three times and there is a root of (8) at
$t = 0.185$, but for $d_3 = 3.1$ the unique root is at $t = 0.943$. This causes the probability of satisfiability to jump
discontinuously from 0.905 to 0.064. The set of pairs $(d_1, d_3)$ for which $f(t)$ is tangent to $1 - t$ is exactly the
curve $\Gamma$ on which the smallest positive root $t_0$ of (1) or (8) is a double root, giving the "waterfall" of Figure 1.



Figure 5. Left, the function $f(t)$ of (8) when $d_1 = 0.1$ and $d_3 = 3$. Right, the probability of satisfiability
$\Phi(d_1, d_3)$ (vertical axis Pr[SAT]) with $d_1$ equal to 0.15 (discontinuous), 0.1756 (critical), and 0.2 (continuous).



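To find where this transition takes place, we set $f'(t) = -1$, yielding $f(t) = 1/(2 d_3 t)$. Substituting the
definition (8) of $f$ and solving for $d_1$ gives

$$ d_1 = 1 - \frac{e^{d_3 t^2}}{2 d_3 t} \tag{9} $$

where $t$ is obtained by setting $1/(2 d_3 t)$ equal to $1 - t$ and taking the smaller root,

$$ t = \frac{1}{2} \left(1 - \sqrt{1 - \frac{2}{d_3}}\right) . \tag{10} $$

Substituting (10) in (9) and simplifying gives (4), proving Proposition 1.3.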
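The algebra behind the curve of discontinuities can be checked numerically. The sketch below evaluates the tangency point $t = \frac{1}{2}(1 - \sqrt{1 - 2/d_3})$, the value $d_1 = 1 - e^{d_3 t^2}/(2 d_3 t)$ it implies, and the simplified closed form $d_1 = 1 - \exp(\frac{1}{4}(\sqrt{d_3} - \sqrt{d_3 - 2})^2) / (d_3 - \sqrt{d_3 (d_3 - 2)})$, and confirms that $f$ touches $1 - t$ with slope $-1$ there. All function names are illustrative assumptions.

```python
import math

def tangency_t(d3):
    """Smaller solution of 2 * d3 * t * (1 - t) = 1; real only for d3 >= 2."""
    return 0.5 * (1.0 - math.sqrt(1.0 - 2.0 / d3))

def gamma_d1_via_t(d3):
    """d1 on the curve, computed from the tangency point."""
    t = tangency_t(d3)
    return 1.0 - math.exp(d3 * t * t) / (2.0 * d3 * t)

def gamma_d1(d3):
    """d1 on the curve, from the simplified closed form."""
    num = math.exp(0.25 * (math.sqrt(d3) - math.sqrt(d3 - 2.0)) ** 2)
    return 1.0 - num / (d3 - math.sqrt(d3 * (d3 - 2.0)))

if __name__ == "__main__":
    for d3 in (2.0, 2.5, 3.0, 5.0):
        t, d1 = tangency_t(d3), gamma_d1(d3)
        f = (1.0 - d1) * math.exp(-d3 * t * t)
        # columns: d3, d1 (both forms), f(t) - (1 - t), f'(t) + 1
        print(d3, d1, gamma_d1_via_t(d3), f - (1.0 - t), -2.0 * d3 * t * f + 1.0)
```

At $d_3 = 2$ both forms reduce to $1 - \sqrt{e}/2$, the critical value of $d_1$, and for larger $d_3$ the value of $d_1$ on the curve decreases.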

The fact that $d_1$ is only real for $d_3 \ge 2$ explains why $\Gamma$ ends at $d_3 = 2$. At this extreme case we have

$$ d_1 = 1 - \frac{\sqrt{e}}{2} \approx 0.1756 \quad \text{and} \quad t = \frac{1}{2} . $$

What happens for $d_3 < 2$? In this case, there is no real $t$ at which $f'(t) = -1$ and $f(t) = 1 - t$ hold together, so
the kind of tangency displayed in Figure 5 cannot happen. In that case, (8) (and equivalently (1)) has a unique solution $t$, which varies
continuously with $d_1$ and $d_3$, and therefore the probability of satisfiability $\Phi(d_1, d_3)$ is continuous as well. To
illustrate this, in the right part of Figure 5 we plot $\Phi(d_1, d_3)$ as a function of $d_3$ with three values of $d_1$. Since
$d_1 \le 1 - \sqrt{e}/2$ everywhere on $\Gamma$, a line of constant $d_1$ crosses $\Gamma$ only if $d_1$ is below the critical value. For
$d_1 = 0.15$, $\Phi$ is therefore discontinuous; for $d_1 = 0.2$, it is continuous; and for $d_1 = 0.1756\ldots$, the critical value at the
second-order transition, it is continuous but has an infinite derivative at $d_3 = 2$.